Predicting site-specific human selective pressure using evolutionary signatures
Abstract
MOTIVATION: The identification of non-coding functional regions of the human genome remains one of the main challenges of genomics. By observing how a given region evolved over time, one can detect signs of negative or positive selection, hinting that the region may be functional. With the rapidly increasing number of vertebrate genomes available for comparison with our own, this type of approach is set to become extremely powerful, provided the right analytical tools are available.
RESULTS: Many approaches have been proposed to measure signs of past selective pressure, usually in the form of a reduced mutation rate. Here, we propose a radically different approach to the detection of non-coding functional regions: instead of measuring past evolutionary rates, we build a machine learning classifier to predict current substitution rates in human based on the inferred evolutionary events that affected the region during vertebrate evolution. We show that different types of evolutionary events, occurring along different branches of the phylogenetic tree, bring very different amounts of information. We propose a number of simple machine learning classifiers and show that a Support-Vector Machine (SVM) predictor clearly outperforms existing tools at predicting human non-coding functional sites. Comparison with external evidence of selection and regulatory function confirms that these SVM predictions are more accurate than those of other approaches.
AVAILABILITY: The predictor and predictions made are available at http://www.mcb.mcgill.ca/~blanchem/sadri.
CONTACT: [email protected].
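The abstract does not specify how evolutionary events are turned into classifier inputs. As a rough illustration only, the sketch below shows one way that inferred events around a site might be encoded as a fixed-length feature vector, with one count per (branch, event type) pair. The branch names, event types, window size, and the function `encode_window` are all hypothetical and are not taken from the paper.

```python
from collections import Counter

# Hypothetical event types and branch groups, chosen only for illustration.
EVENT_TYPES = ("substitution", "insertion", "deletion")
BRANCHES = ("human-chimp", "primates", "mammals", "vertebrates")


def encode_window(events, center, half_width=2):
    """Count (branch, event type) pairs for events near a site.

    events: iterable of (position, branch, event_type) tuples describing
            inferred evolutionary events (an assumed input format).
    center: position of the site being described.
    half_width: events farther than this from center are ignored.
    Returns a list of counts in a fixed order: for each branch in BRANCHES,
    one count per event type in EVENT_TYPES.
    """
    counts = Counter(
        (branch, etype)
        for pos, branch, etype in events
        if abs(pos - center) <= half_width
    )
    # Counter returns 0 for pairs that never occurred.
    return [counts[(b, e)] for b in BRANCHES for e in EVENT_TYPES]


# Made-up events for a toy region.
events = [
    (10, "mammals", "substitution"),
    (11, "mammals", "substitution"),
    (12, "primates", "deletion"),
    (20, "vertebrates", "insertion"),
]
features = encode_window(events, center=11)
```

Keeping branch and event type as separate dimensions lets a classifier weight them differently, in line with the abstract's observation that different event types on different branches carry different amounts of information. Vectors of this kind could then be passed to any standard SVM implementation, for example `sklearn.svm.SVC` in scikit-learn.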
Related articles
Signatures of Natural Selection at the FTO (Fat Mass and Obesity Associated) Locus in Human Populations
BACKGROUND AND AIMS Polymorphisms in the first intron of FTO have been robustly replicated for associations with obesity. In the Sorbs, a Slavic population resident in Germany, the strongest effect on body mass index (BMI) was found for a variant in the third intron of FTO (rs17818902). Since this may indicate population-specific effects of FTO variants, we initiated studies testing FTO for sig...
Use of estimated evolutionary strength at the codon level improves the prediction of disease-related protein mutations in humans.
Predicting the functional impact of protein variation is one of the most challenging problems in bioinformatics. A rapidly growing number of genome-scale studies provide large amounts of experimental data, allowing the application of rigorous statistical approaches for predicting whether a given single point mutation has an impact on human health. Up until now, existing methods have limited the...
CpG Islands Undermethylation in Human Genomic Regions under Selective Pressure
DNA methylation at CpG islands (CGIs) is one of the most intensively studied epigenetic mechanisms. It is fundamental for cellular differentiation and control of transcriptional potential. DNA methylation is also involved in several processes that are central to evolutionary biology, including phenotypic plasticity and evolvability. In this study, we explored the relationship between CpG island...
Infectious Disease and the Diversification of the Human Genome.
The human immune system is under great pathogen-mediated selective pressure. Divergent infectious disease pathogenesis across human populations combined with the overrepresentation of "immune genes" in genomic regions with signatures of positive selection suggests that pathogens have significantly altered the human genome. However, important features of the human immune system can confound sear...
Predicting Carriers of Ongoing Selective Sweeps without Knowledge of the Favored Allele
Methods for detecting the genomic signatures of natural selection have been heavily studied, and they have been successful in identifying many selective sweeps. For most of these sweeps, the favored allele remains unknown, making it difficult to distinguish carriers of the sweep from non-carriers. In an ongoing selective sweep, carriers of the favored allele are likely to contain a future most ...